What is claimed is: 


1. A method of indexing a database of documents, comprising:
providing a vocabulary of n terms; 

indexing the database in the form of a non-negative n x m index matrix V,

wherein:

m is equal to the number of documents in the database; 
n is equal to the number of terms used to represent the database; and 
the value of each element v_ij of index matrix V is a function of the number of occurrences
of the i-th vocabulary term in the j-th document;
factoring out non-negative matrix factors T and D such that

V ≈ TD; and

wherein T is an n x r term matrix, D is an r x m document matrix, and r < nm/(n+m).
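
The indexing step of claim 1 can be illustrated with a minimal sketch. It assumes raw term counts as the "function of the number of occurrences" and whitespace tokenization; both are illustrative choices, and the names `build_index_matrix` and `max_compressing_rank` are hypothetical. The bound r < nm/(n+m) is equivalent to r(n+m) < nm, that is, T and D together hold fewer entries than V.

```python
from collections import Counter

import numpy as np


def build_index_matrix(documents, vocabulary):
    """Return the n x m matrix V with V[i, j] = count of term i in document j."""
    term_row = {term: i for i, term in enumerate(vocabulary)}
    V = np.zeros((len(vocabulary), len(documents)))
    for j, text in enumerate(documents):
        # Assumption: lowercase whitespace tokens stand in for real tokenization.
        for term, count in Counter(text.lower().split()).items():
            if term in term_row:
                V[term_row[term], j] = count
    return V


def max_compressing_rank(n, m):
    """Largest integer r with r < n*m / (n + m), i.e. r*(n + m) < n*m."""
    return (n * m - 1) // (n + m)


# Toy data, invented purely for illustration.
docs = ["matrix factor matrix", "document index", "index matrix document document"]
vocab = ["matrix", "factor", "document", "index"]
V = build_index_matrix(docs, vocab)
n, m = V.shape
r_max = max_compressing_rank(n, m)
```

For this toy 4 x 3 matrix the bound allows only r = 1; for realistic n and m it leaves a wide range of admissible ranks.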

2. The method of claim 1 further comprising deleting said index matrix V. 

3. The method of claim 2 further comprising deleting said term matrix T.

4. The method of claim 1 wherein r is at least one order of magnitude 
smaller than n. 
5. The method of claim 1 wherein r is from two to three orders of magnitude
smaller than n. 
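
To give a feel for the sizes involved when r is orders of magnitude smaller than n, the following sketch counts dense matrix entries. The values of n, m, and r are hypothetical, chosen only for round numbers, and sparsity of V is ignored.

```python
def entry_counts(n, m, r):
    """Dense entry counts for V (n x m), T (n x r) and D (r x m)."""
    return {"V": n * m, "T": n * r, "D": r * m}


# Hypothetical sizes: r is three orders of magnitude smaller than n.
counts = entry_counts(n=100_000, m=1_000_000, r=100)
ratio_v_to_d = counts["V"] // counts["D"]  # equals n // r
```

Under these assumed sizes, D alone holds n/r times fewer entries than a dense V.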

6. The method of claim 1 wherein entries of said document matrix D falling 
below a predetermined threshold value t are set to zero. 
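
The thresholding of claim 6 might look like the sketch below. The function name `sparsify`, the example matrix, and the value chosen for t are all illustrative assumptions.

```python
import numpy as np


def sparsify(D, t):
    """Return a copy of D with every entry below the threshold t set to zero."""
    D_sparse = D.copy()
    D_sparse[D_sparse < t] = 0.0
    return D_sparse


# Illustrative 2 x 3 document matrix and threshold.
D = np.array([[0.90, 0.02, 0.40],
              [0.01, 0.75, 0.03]])
D_sparse = sparsify(D, t=0.05)
```

Zeroing small entries lets D be held in a sparse storage format, at the cost of a slightly coarser approximation.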

7. The method of claim 2 wherein r is at least one order of magnitude
smaller than n.

8. The method of claim 2 wherein r is from two to three orders of magnitude 
smaller than n. 

9. The method of claim 2 wherein entries of said document matrix D falling 
below a predetermined threshold value t are set to zero.

10. The method of claim 3 wherein r is at least one order of magnitude 
smaller than n. 

11. The method of claim 3 wherein r is from two to three orders of magnitude
smaller than n. 

12. The method of claim 3 wherein entries of said document matrix D falling
below a predetermined threshold value t are set to zero.
13. The method of claim 1 wherein said factoring out of non-negative matrix
factors T and D further comprises: 

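selecting a cost function and associated update rules from the group:

cost function $F = \sum_{i} \sum_{j} \left[ V_{ij} \log (TD)_{ij} - (TD)_{ij} \right]$ associated with
update rules $T_{ik} \leftarrow T_{ik} \sum_{j} \frac{V_{ij}}{(TD)_{ij}} D_{kj}$, $T_{ik} \leftarrow \frac{T_{ik}}{\sum_{l} T_{lk}}$, and $D_{kj} \leftarrow D_{kj} \sum_{i} T_{ik} \frac{V_{ij}}{(TD)_{ij}}$;

cost function $F = \sum_{i=1}^{n} \sum_{j=1}^{m} \ldots$ associated with
update rules $D_{kj} \leftarrow D_{kj} \ldots$ and $\ldots$; and

cost function $\lVert V - TD \rVert^{2} = \sum_{i} \sum_{j} \left( V_{ij} - (TD)_{ij} \right)^{2}$ associated with update rules $\ldots$;

and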


iteratively calculating said update rules so as to converge said cost
function toward a limit until the distance between V and TD is reduced to or beyond a
desired value.
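
The iterative step of claim 13 can be sketched for the squared-distance cost ||V - TD||^2. The update rules used here are the widely published Lee and Seung multiplicative updates for that cost; they are an assumption of this sketch rather than a transcription of the claim. The function name, the random initialization, the small constant `eps`, and the stopping parameters are likewise illustrative.

```python
import numpy as np


def nmf_euclidean(V, r, max_iter=500, target=1e-6, eps=1e-12, seed=0):
    """Factor non-negative V (n x m) into T (n x r) and D (r x m) with V ~ TD."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Strictly positive random start, so multiplicative updates stay non-negative.
    T = rng.random((n, r)) + eps
    D = rng.random((r, m)) + eps
    costs = []
    for _ in range(max_iter):
        # Assumed multiplicative updates for the squared Euclidean cost;
        # eps guards against division by zero.
        D *= (T.T @ V) / (T.T @ T @ D + eps)
        T *= (V @ D.T) / (T @ D @ D.T + eps)
        costs.append(float(np.linalg.norm(V - T @ D) ** 2))
        # Stop once the distance between V and TD reaches the desired value.
        if costs[-1] <= target:
            break
    return T, D, costs


# Toy 4 x 3 index matrix, invented for illustration; r = 1 satisfies r < nm/(n+m).
V = np.array([[2.0, 0.0, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 2.0],
              [0.0, 1.0, 1.0]])
T, D, costs = nmf_euclidean(V, r=1)
```

Because each update multiplies by a non-negative ratio, T and D remain non-negative throughout, and the recorded cost does not increase from one iteration to the next.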


14. A program storage device readable by machine, tangibly embodying a 
program of instructions executable by the machine to perform method steps for indexing 
a database of documents, said method steps comprising:

providing a vocabulary of n terms; 
indexing the database in the form of a non-negative n x m index matrix V,

wherein: 

m is equal to the number of documents in the database; 
n is equal to the number of terms used to represent the database; and 
the value of each element v_ij of index matrix V is a function of the number
of occurrences of the i-th vocabulary term in the j-th document;

factoring out non-negative matrix factors T and D such that
V ≈ TD; and

wherein T is an n x r term matrix, D is an r x m document matrix, and r < nm/(n+m).


15. A database index, comprising: 

an r x m document matrix D, such that
V ≈ TD

wherein T is an n x r term matrix; 
V is a non-negative n x m index matrix, wherein each of its m columns
represents a j-th document having n entries containing the value of a function of the
number of occurrences of an i-th term appearing in said j-th document; and

wherein T and D are non-negative matrix factors of V and r < nm/(n+m); 

and 

wherein each of the m columns of said document matrix D corresponds to
said j-th document.
16. A method of information retrieval, comprising:

providing a query comprising a plurality of search terms; 
providing a vocabulary of n terms; 

performing a first pass retrieval through a first database representation and 
scoring m retrieved documents according to relevance to said query;

executing a second pass retrieval through a second database representation 
and scoring documents retrieved from said first pass retrieval so as to generate a final 
relevancy score for each document; and 

wherein said second database representation comprises an r x m document
matrix D, such that

V ≈ TD

wherein T is an n x r term matrix;

V is a non-negative n x m index matrix, wherein each of its m columns
represents a j-th document having n entries containing the value of a function of the
number of occurrences of an i-th term of said vocabulary appearing in said j-th document; and

wherein T and D are non-negative matrix factors of V and r < nm/(n+m); 

and 

wherein each of the m columns of said document matrix D corresponds to 
said j-th document.

17. The method of claim 16 wherein said final relevancy score for any j-th
document is a function of said j-th document's corresponding entry in said document
matrix D and the corresponding entries in said document matrix D of the T top-scoring
documents from said first pass retrieval.

18. The method of claim 17 wherein said relevancy score function for said j-th
document is proportional to a sum of cosine distances between said j-th document's
corresponding entry in said document matrix D and each of said corresponding entries in
said document matrix D of the T top-scoring documents from said first pass retrieval.
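
The second-pass scoring of claims 17 and 18 might be sketched as follows. The sketch assumes that "cosine distance" means the cosine of the angle between two columns of D (cosine similarity), and that each document's "entry" is its column of D. The function name, the example matrix, and the choice of top-scoring documents are hypothetical; how this score is combined with the first-pass score is not shown.

```python
import numpy as np


def second_pass_scores(D, candidates, top_docs, eps=1e-12):
    """Sum, for each candidate column of D, its cosine similarity to every top_docs column."""
    # Normalize each column to unit length; eps avoids division by zero.
    unit = D / (np.linalg.norm(D, axis=0) + eps)
    # sims[a, b] = cosine between candidate a and top-scoring document b.
    sims = unit[:, candidates].T @ unit[:, top_docs]
    return sims.sum(axis=1)


# Illustrative 2 x 4 document matrix: documents 0 and 1 share one factor,
# documents 2 and 3 the other.
D = np.array([[1.0, 0.9, 0.0, 0.1],
              [0.0, 0.1, 1.0, 0.9]])
top_docs = [0, 1]          # assumed top-scoring documents from the first pass
candidates = [0, 1, 2, 3]  # documents retrieved in the first pass
scores = second_pass_scores(D, candidates, top_docs)
```

In this toy example, documents whose columns of D point in the same direction as the top-scoring documents receive the larger second-pass scores.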

19. The method of claim 16 wherein r is at least one order of magnitude 
smaller than n. 

20. The method of claim 16 wherein r is from two to three orders of 
magnitude smaller than n.

21. The method of claim 16 wherein entries of said document matrix D falling 
below a predetermined threshold value t are set to zero. 

22. A program storage device readable by machine, tangibly embodying a 
program of instructions executable by the machine to perform method steps for 

information retrieval, said method steps comprising:

providing a query comprising a plurality of search terms; 
providing a vocabulary of n terms; 
performing a first pass retrieval through a first database representation and
scoring m retrieved documents according to relevance to said query; 

executing a second pass retrieval through a second database representation 
and scoring documents retrieved from said first pass retrieval so as to generate a final 
relevancy score for each document; and 

wherein said second database representation comprises an r x m document 
matrix D, such that 

V ≈ TD

wherein T is an n x r term matrix;

V is a non-negative n x m index matrix, wherein each of its m columns
represents a j-th document having n entries containing the value of a function of the
number of occurrences of an i-th term of said vocabulary appearing in said j-th document; and

wherein T and D are non-negative matrix factors of V and r < nm/(n+m);

and 

wherein each of the m columns of said document matrix D corresponds to 
said j-th document.